Optimal  and  Low-complexity  Algorithms  for 
Dynamic  Spectrum  Access  in  Centralized  Cognitive 
Radio  Networks  with  Fading  Channels 


Mario  Bkassiny,  Sudharman  K.  Jayaweera,  Yang  Li 
Dept. of Electrical and Computer Engineering
University  of  New  Mexico 
Albuquerque,  NM,  USA 

Email: {bkassiny, jayaweera, yangli}@ece.unm.edu


Keith  A.  Avery 
Space  Vehicle  Directorate 
Air  Force  Research  Laboratory  (AFRL) 
Kirtland AFB, Albuquerque, NM, USA


Abstract — In  this  paper,  we  develop  a  centralized  spectrum 
sensing and Dynamic Spectrum Access (DSA) scheme for secondary
users (SUs) in a Cognitive Radio (CR) network. Assuming
that the primary channel occupancy follows a Markovian
evolution, the channel sensing problem is modeled as a Partially
Observable Markov Decision Process (POMDP). We assume that
each SU can sense only one channel at a time by using energy
detection, and the sensing outcomes are then reported to a central
unit, called the secondary system decision center (SSDC), that
determines the channel sensing/accessing policies. We derive both
the optimal channel assignment policy for secondary users to
sense the primary channels, and the optimal channel access rule.
Our proposed optimal sensing and accessing policies alleviate
many shortcomings and limitations of existing proposals: (a)
they allow full utilization of all available primary spectrum white
spaces, (b) our model, and thus the proposed solution, exploits the
temporal and spatial diversity across different primary channels,
and (c) the solution is based on realistic local sensing decisions rather
than complete knowledge of primary signalling structure. As an
alternative to the high complexity of the optimal channel sensing
policy, a suboptimal sensing policy is obtained by using the
Hungarian algorithm iteratively, which reduces the complexity
of the channel assignment from an exponential to a polynomial
order. We also propose a heuristic algorithm that reduces the
complexity of the sensing policy further to a linear order. The
simulation results show that the proposed algorithms achieve
a near-optimal performance with a significant reduction in
computational time.

Index  Terms — Cognitive  radio,  dynamic  spectrum  access 
(DSA),  partially  observable  Markov  decision  processes  (POMDP), 
Hungarian  algorithm. 

I. Introduction

Opportunistic Spectrum Access (OSA) is emerging
as one of the Dynamic Spectrum Access (DSA) techniques
that can mitigate the underutilization of the spectrum
bands. Such DSA techniques can be implemented by using
Cognitive Radio (CR) devices, which are supposed to be
equipped with the ability to learn and adapt to their RF
environment. A set of cognitive radios may form a secondary
network that coexists with the primary licensed users and
shares the spectrum opportunistically. This is referred to as
the spectrum interweave, which permits the secondary users
to communicate by using the spectrum holes in the primary
bands [1].

In order to achieve successful spectrum coexistence, however,
the cognitive users should be able to correctly identify
the spectrum holes and to transmit without interfering with
the primary users (PUs). Of course, this may not always be
possible: the secondary users can make wrong decisions about
the occupancy of the spectrum holes due to receiver noise
and fading in the wireless channels. Sophisticated detectors,
such as the matched filter and the cyclostationary detector,
may be employed by cognitive users for obtaining a better
estimate of the primary channels' status, as we describe in
the accompanying paper. However, this would require some
information about the primary signal, leading to additional
complexity at the cognitive devices. On the other hand, SUs
are usually intended to operate in different RF environments;
therefore, they aim to detect any existing primary signal,
irrespective of its characteristics. In this case, the cognitive
users do not assume any knowledge about the primary signal
and they may employ energy detection as an optimal technique
to perform the spectrum sensing, as we describe
throughout this paper.

A secondary user can obtain a better estimate of the
primary channels' occupancy by basing its decisions not only
on the current but also on the past observations of the channels,
if the primary traffic exhibits some temporal correlation.
In particular, if a channel is characterized at each instant to
be either idle (state 0) or busy (state 1), the state transitions
may be modeled as a Markov chain, and the optimal sensing
policy can be obtained by modeling the system as a Partially
Observable Markov Decision Process (POMDP). This method
has been studied in the past [2], [3], but the optimal solution
to the POMDP is shown to be computationally prohibitive
because of the continuum of the state space. In this case, it is
more convenient to maximize a reward function at each time
instant, instead of maximizing the total discounted return, thus
obtaining a myopic policy for the POMDP problem.

In this paper, unlike [2], we assume a centralized CR
network with a Secondary System Decision Center (SSDC)
that receives, at every instant, the sensing outcomes of all
the SUs and determines the sensing and accessing policies of
SUs accordingly. The observations of the SUs are assumed to
be affected by independent channel fading coefficients, which
makes our proposed model more realistic than the model in [3]
since we take into account the spatial and temporal variations
of the wireless channels. Also, we do not restrict all the SUs
to sense the same primary channel at a time, as assumed
in [3]. This allows efficient exploitation of all spectrum vacancies,
thus maximizing the network throughput. We design
the optimal detectors of the SUs and we derive a myopic
channel sensing policy that maximizes the secondary network
throughput at each time instant. However, the optimal solution
of the myopic sensing policy is found to be computationally
expensive since it has an exponential complexity. Therefore,
we apply the Hungarian algorithm iteratively in order to find a
near-optimal sensing policy in polynomial time. The iterative
Hungarian algorithm extends the Hungarian algorithm [4] by allowing
more than one vertex to be connected to a single vertex of the
other bipartite set, which is equivalent to assigning more than
one SU to sense a single primary channel when the number of
SUs is larger than the number of channels. We also propose a
heuristic algorithm that solves the channel assignment problem
with linear complexity. The simulation results show
that these proposed low-complexity algorithms can achieve
a near-optimal policy with a significant reduction in the
computation time.

The remainder of this paper is organized as follows: Section
II defines the system model, where we describe the local sensing
decisions. Section III determines the accessing decisions
at the SSDC, and Section IV presents the two algorithms for
deriving the sensing policy. The simulation results are shown
in Section V and we conclude this paper in Section VI.

II. System Model

We assume a group of $N$ SUs and a collection of $M$
primary channels. The primary channels' states are modeled
as statistically identical and independent two-state (busy and
idle) Markov chains. The state busy refers to the channel
being occupied by a PU, whereas the idle state refers to a
spectrum vacancy which can be used by SUs. We denote
the true state of primary channel $m \in \{1, \cdots, M\}$ in time
slot $k$ by $S_m(k) \in \{0, 1\}$. The stationary transition probability
of channel $m$ from state $i$ to state $j$ is defined as
$p_{ij} = \Pr\{S_m(k+1) = j \mid S_m(k) = i\}$, $\forall i, j \in \{0, 1\}$. The
transition probability matrix of the Markov chain is denoted
by $\mathbf{P} = [p_{00} \; p_{01}; \; p_{10} \; p_{11}]$.
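As a minimal sketch of this occupancy model, the two-state chain can be simulated and its long-run idle fraction compared with the closed-form stationary probability $p_{10}/(p_{01}+p_{10})$. The function names and the numerical values below ($p_{01} = 0.1$, $p_{10} = 0.8$, borrowed from the simulation section) are illustrative assumptions, not part of the original model description.

```python
import random

def stationary_idle_prob(p01, p10):
    """Stationary probability of the idle state (0) of a two-state chain."""
    return p10 / (p01 + p10)

def simulate_channel(p01, p10, steps, seed=0):
    """Simulate S_m(k) in {0, 1}, where 0 = idle and 1 = busy."""
    rng = random.Random(seed)
    # Start from the stationary distribution.
    state = 0 if rng.random() < stationary_idle_prob(p01, p10) else 1
    states = []
    for _ in range(steps):
        states.append(state)
        # Probability of being busy in the next slot given the current state:
        # p01 from idle, p11 = 1 - p10 from busy.
        p_to_busy = p01 if state == 0 else 1.0 - p10
        state = 1 if rng.random() < p_to_busy else 0
    return states

# Illustrative values only (assumed here for the example).
states = simulate_channel(p01=0.1, p10=0.8, steps=20000)
idle_fraction = states.count(0) / len(states)
```

With these values the empirical idle fraction should settle near 8/9, which is the share of time slots that offer a spectrum opportunity.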

When  a  SU  successfully  accesses  a  primary  channel  that  is 
idle  during  a  given  time  slot,  the  SU  is  assumed  to  receive 
a  reward  proportional  to  the  bandwidth  of  that  channel.  If  a 
SU  accesses  a  primary  channel  in  state  busy,  it  will  cause  a 
collision  with  PUs’  transmission  and  it  gets  a  0  reward  in  this 
case.  The  accumulated  total  reward  of  all  SUs  is  used  as  a 
measure  of  the  secondary  system  throughput  over  the  primary 
channels. 

In  order  to  detect  the  spectrum  white  spaces,  SUs  perform 
spectrum  sensing.  We  assume  that  the  secondary  CRs  are 
equipped  with  only  a  single  antenna  that  switches  between 
sensing  and  actual  communication.  As  a  result,  when  a  SU  is 
performing  channel  sensing,  it  stops  its  data  communications. 
It is also assumed that a single SU can only sense one
primary channel at a time. As shown in Fig. 1, SUs sense
primary  channels  during  the  designated  sensing  periods  at  the 
beginning  of  each  time  slot.  It  is  assumed  that  if  a  PU  intends 
to  use  its  channel  during  a  transmitting  period,  it  will  start  to 
transmit  from  the  beginning  of  that  time  slot.  On  the  other 
hand,  we  assume  that  multiple  SUs  can  simultaneously  sense 
the  same  primary  channel. 


Fig.  1.  Slotted  time  horizon  with  Sensing  Periods  and  Transmitting  Periods. 


At each time instant $k$, the SSDC predicts the channel fading
coefficients in the next slot $k+1$. Based on these coefficients
and on the belief of the channels' state in the next time slot,
the SSDC computes the sensing decisions for time $k+1$. Then,
each SU senses its assigned channel and reports its sensing
outcome to the SSDC, which decides which channel to access
at time $k+1$. The access is scheduled among secondary users
such that it guarantees equal spectrum opportunities for all
SUs.

We represent the sensing decision by the $M \times N$ matrix $\mathbf{A}_k$,
where $\mathbf{A}_k(m,n) \in \{0,1\}$. The secondary user $n$ should sense
channel $m$ at time $k$ only if $\mathbf{A}_k(m,n) = 1$. Similarly, we
define the $M \times N$ matrix $\mathbf{B}_k$ to denote the accessing decision
at time $k$.

We use the $M \times N$ matrix $\mathbf{Y}_k$ to denote the collection of
observation results from all SUs on their assigned primary
channels at time $k$, with $\mathbf{Y}_k(m,n) = y_{m,n}(k)$, where $y_{m,n}(k)$
denotes the report from SU $n$ to the SSDC on the state
of the $m$-th primary user at time $k$. The SSDC uses the entries
$\mathbf{Y}_k(m,n)$ such that $\mathbf{A}_k(m,n) = 1$ in order to make the
access decisions at time $k$. The decision making architecture
is summarized in Algorithm 1.


Algorithm  1  Decision  making  architecture 

1. At each time $k$, based on previous knowledge of primary
channels and channel observations, the SSDC sends out the
sensing decisions $\mathbf{A}_k$ to all SUs.

2. SUs perform channel sensing according to $\mathbf{A}_k$ and the
sensing result $\mathbf{Y}_k$ is reported back to the SSDC.

3. Based on the channel sensing result $\mathbf{Y}_k$, the SSDC sends out
the accessing decisions $\mathbf{B}_k$ to all SUs.

4. SUs access primary channels according to $\mathbf{B}_k$.

5. For $k \leftarrow k + 1$, repeat 1 through 5.


When sensing a channel $m$ at time $k$, the SU $n$ gets the
observation $r_{m,n}(k)$ defined as:

$$
r_{m,n}(k) =
\begin{cases}
h_{m,n}(k)\, x_m(k) + w_n(k) & \text{if } \mathcal{H}_0 : S_m(k) = 1 \\
w_n(k) & \text{if } \mathcal{H}_1 : S_m(k) = 0
\end{cases}
\tag{1}
$$

where $x_m(k)$ is the transmitted primary signal, $w_n(k)$ is a
zero-mean Gaussian noise with variance $\sigma_w^2$, and $h_{m,n}(k)$ is
the fading coefficient between the $m$-th primary transmitter
and the $n$-th secondary receiver at time $k$. The channel
coefficient $h_{m,n}(k)$ is assumed to be zero-mean Gaussian
distributed with a given variance. We assume that the SSDC has
perfect knowledge of all channel coefficients at each time $k$.
Since the SU does not have knowledge about the primary
signal, we model $x_m(k)$ as a zero-mean Gaussian random
variable with variance $\sigma_x^2$.

Instead of transmitting the observation $r_{m,n}(k)$ to the
SSDC, we assume that SUs report an estimate of the primary
channel state $S_m(k)$, based on the observation $r_{m,n}(k)$. The
state estimate is denoted by $y_{m,n}(k)$ and it is obtained by using
a maximum a posteriori (MAP) detector for the observation in
(1). Therefore, $S_m(k) \in \{0,1\}$ and $y_{m,n}(k) \in \{0,1\}$ can be
modeled, respectively, as the input and output of a Binary
Asymmetric Channel (BAC) having crossover probabilities
$\lambda^0_{m,n}(k)$ and $\lambda^1_{m,n}(k)$ under hypotheses $\mathcal{H}_1 : \{S_m(k) = 0\}$
and $\mathcal{H}_0 : \{S_m(k) = 1\}$, respectively, as we illustrate in Fig. 2.
We assume that transmitting the $y_{m,n}(k)$'s to the SSDC is error
free.


Fig. 2. SUs' reports of observations on primary channels can be modeled
as Binary Asymmetric Channels.


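The state estimate $y_{m,n}(k) \in \{0,1\}$ is given in
(2) by using a MAP detector [5] such that $y_{m,n}(k) =
\arg\max_{i \in \{0,1\}} \Pr\{S_m(k) = i \mid r_{m,n}(k)\}$. Then,

$$
y_{m,n}(k) =
\begin{cases}
0 & \text{if } r_{m,n}^2(k) < \eta_{m,n}(k) \\
1 & \text{if } r_{m,n}^2(k) \geq \eta_{m,n}(k)
\end{cases},
\tag{2}
$$

where

$$
\eta_{m,n}(k) = \frac{\ln\!\left(\dfrac{h_{m,n}^2(k)\sigma_x^2 + \sigma_w^2}{\sigma_w^2}\right) - 2\ln(\rho_m(k))}{h_{m,n}^2(k)\sigma_x^2 \Big/ \left[\left(h_{m,n}^2(k)\sigma_x^2 + \sigma_w^2\right)\sigma_w^2\right]}
\tag{3}
$$

and $\rho_m(k)$ is the ratio of the prior probability that channel $m$ is
busy to the prior probability that it is idle at time $k$. From (2), in this case, the MAP
detector is an energy detector when $x_m(k)$ is assumed to be
a Gaussian random variable.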
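The threshold test can be checked numerically. The sketch below assumes the Gaussian model of (1), with noise variance `var_w`, primary signal variance `var_x`, a known nonzero fading coefficient `h`, and a prior ratio `rho` of busy to idle probability; all names and numerical values are illustrative. It compares the energy-detector form with a direct comparison of the two posteriors.

```python
import math

def map_threshold(h, var_x, var_w, rho):
    """Threshold on r^2; rho = Pr{busy} / Pr{idle}. Assumes h != 0."""
    g = h * h * var_x                      # received primary signal power
    scale = g / ((g + var_w) * var_w)      # 1/var_w - 1/(g + var_w)
    return (math.log((g + var_w) / var_w) - 2.0 * math.log(rho)) / scale

def energy_detect(r, eta):
    """Local estimate y: 1 (busy) if r^2 >= eta, else 0 (idle)."""
    return 1 if r * r >= eta else 0

def map_direct(r, h, var_x, var_w, p_busy):
    """Reference rule: compare the two (unnormalized) posteriors directly."""
    def gauss(x, var):
        return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    busy = p_busy * gauss(r, h * h * var_x + var_w)
    idle = (1.0 - p_busy) * gauss(r, var_w)
    return 1 if busy >= idle else 0

# Illustrative numbers: Pr{busy} = 0.2, so rho = 0.2 / 0.8.
eta = map_threshold(h=0.8, var_x=4.0, var_w=1.0, rho=0.2 / 0.8)
```

Because the idle prior dominates in this example, the threshold is pushed upward: the detector needs a fairly large observed energy before it reports the channel as busy.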

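By noting that the random variables $r_{m,n}^2(k)/\sigma_w^2$ and
$r_{m,n}^2(k)/(\sigma_w^2 + h_{m,n}^2(k)\sigma_x^2)$ have a chi-squared distribution
under hypotheses $\mathcal{H}_1$ and $\mathcal{H}_0$, respectively, we may compute
the crossover probabilities of the BAC sensing model as:

$$
\lambda^0_{m,n}(k) = 1 - \frac{1}{\Gamma\!\left(\frac{1}{2}\right)}\,\gamma\!\left(\frac{1}{2}, \frac{\eta_{m,n}(k)}{2\sigma_w^2}\right),
\tag{4}
$$

$$
\lambda^1_{m,n}(k) = \frac{1}{\Gamma\!\left(\frac{1}{2}\right)}\,\gamma\!\left(\frac{1}{2}, \frac{\eta_{m,n}(k)}{2\left(\sigma_w^2 + h_{m,n}^2(k)\sigma_x^2\right)}\right),
\tag{5}
$$

where $\Gamma(a)$ and $\gamma(a,b)$ stand for the Gamma and the lower
incomplete Gamma functions, respectively [6].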
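For a shape parameter of 1/2, the regularized lower incomplete Gamma function reduces to the error function, $\gamma(1/2, x)/\Gamma(1/2) = \mathrm{erf}(\sqrt{x})$, so the two crossover probabilities can be evaluated with the standard library alone. The sketch below is illustrative: the function name is hypothetical, `lam0` stands for $\Pr\{y = 1 \mid \text{idle}\}$ and `lam1` for $\Pr\{y = 0 \mid \text{busy}\}$, and the handling of a non-positive threshold is an added assumption.

```python
import math

def crossover_probs(eta, h, var_x, var_w):
    """Return (lam0, lam1) for an energy detector with threshold eta on r^2.

    lam0 = Pr{r^2 >= eta | idle}  (channel idle, reported busy)
    lam1 = Pr{r^2 <  eta | busy}  (channel busy, reported idle)
    """
    if eta <= 0.0:
        # Assumed convention: a non-positive threshold always reports busy.
        return 1.0, 0.0
    lam0 = math.erfc(math.sqrt(eta / (2.0 * var_w)))
    lam1 = math.erf(math.sqrt(eta / (2.0 * (var_w + h * h * var_x))))
    return lam0, lam1

# Illustrative numbers: unit noise variance, total busy variance 1 + 3 = 4.
lam0, lam1 = crossover_probs(eta=1.0, h=1.0, var_x=3.0, var_w=1.0)
```

In this example `lam0` is the probability that a standard Gaussian exceeds 1 in magnitude, and `lam1` is the probability that it stays below 0.5 in magnitude.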


III.  Centralized  Access  Decisions  at  the  SSDC 

In  order  to  keep  the  above  collision  probability  with  PUs 
below  a  certain  threshold,  we  apply  a  Neyman-Pearson  type 
detector  [5]  at  the  SSDC  to  obtain  the  access  decision  rule. 
For simplicity, we use the variable length vector $\mathbf{y}_k(m,:) =
\{y_{m,n}(k) : \forall n \in \mathbb{N}_m(k)\}$ to denote all channel sensing reports
at time $k$ from the SUs on channel $m$, where $\mathbb{N}_m(k)$ is the set of
SUs that sense channel $m$ at time $k$. We define the variable
length vector $\mathbf{y}_{0:k}(m,:) = \{\mathbf{y}_0(m,:), \cdots, \mathbf{y}_k(m,:)\}$ to denote
the sensing results on channel $m$ from time 0 to $k$. We also
collect the states of channel $m$ from time
0 to $k$ in a vector; the set of all possible state vectors is denoted by
$\mathbb{S}_k = \{0,1\}^{k+1}$.

At time $k$, for the $m$-th primary channel, the SSDC chooses
one of the two possible hypotheses based on $\mathbf{y}_{0:k}(m,:)$:

$$
\begin{aligned}
\mathcal{H}_1 &: \mathbf{y}_{0:k}(m,:) \sim P_{m,1} \quad \text{(channel idle)} \\
\mathcal{H}_0 &: \mathbf{y}_{0:k}(m,:) \sim P_{m,0} \quad \text{(channel busy)},
\end{aligned}
$$

where $P_{m,1}$ and $P_{m,0}$ denote the conditional probability
density of the vector $\mathbf{y}_{0:k}(m,:)$ given $S_m(k) = 0$ and
$S_m(k) = 1$, respectively. The corresponding likelihood ratio
based on $\mathbf{y}_{0:k}(m,:)$ is complicated and hard to derive in
general because the length of the sequence $\mathbf{y}_{0:k}(m,:)$ increases
with time. To simplify the access decision structure, we
assume that the access decisions are based only on the current
observations $\mathbf{y}_k(m,:)$. Then, for the $m$-th primary channel,
the likelihood ratio is defined as:

$$
\mathrm{LR}(\mathbf{y}_k(m,:)) = \frac{P_{m,1}(\mathbf{y}_k(m,:))}{P_{m,0}(\mathbf{y}_k(m,:))},
\tag{6}
$$

where we reuse the notation $P_{m,1}$ and $P_{m,0}$ to denote
the conditional probability density of the vector $\mathbf{y}_k(m,:)$ given
$S_m(k) = 0$ and $S_m(k) = 1$, respectively.

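The corresponding log-likelihood ratio is given by

$$
\mathrm{LLR}(\mathbf{y}_k(m,:)) = \sum_{n \in \mathbb{N}_m(k)} y_{m,n}(k)\, c_{m,n}(k) + d_m(k),
$$

where, with $\lambda^0_{m,n}(k) = \Pr\{y_{m,n}(k) = 1 \mid S_m(k) = 0\}$ and
$\lambda^1_{m,n}(k) = \Pr\{y_{m,n}(k) = 0 \mid S_m(k) = 1\}$ denoting the crossover
probabilities in (4) and (5), we define

$$
c_{m,n}(k) = \ln\!\left(\frac{\lambda^0_{m,n}(k)}{1 - \lambda^1_{m,n}(k)} \cdot \frac{\lambda^1_{m,n}(k)}{1 - \lambda^0_{m,n}(k)}\right)
$$

and

$$
d_m(k) = \sum_{n \in \mathbb{N}_m(k)} \ln\!\left(\frac{1 - \lambda^0_{m,n}(k)}{\lambda^1_{m,n}(k)}\right).
$$

A sufficient statistic is $\sum_{n \in \mathbb{N}_m(k)} y_{m,n}(k)\, c_{m,n}(k)$. In other words, the test
$\mathrm{LLR}(\mathbf{y}_k(m,:)) \gtrless_{\mathcal{H}_0}^{\mathcal{H}_1} \tau_m(k)$ is equivalent to the test
$\sum_{n \in \mathbb{N}_m(k)} y_{m,n}(k)\, c_{m,n}(k) \gtrless_{\mathcal{H}_0}^{\mathcal{H}_1} \tau_m(k) - d_m(k) = \tau'_m(k)$.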
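The linear form of the log-likelihood ratio can be verified against a direct evaluation. In the sketch below, `lam0[n]` is taken to be $\Pr\{y = 1 \mid \text{idle}\}$ and `lam1[n]` to be $\Pr\{y = 0 \mid \text{busy}\}$ for each reporting SU, the reports are assumed conditionally independent, and the numerical values are purely illustrative.

```python
import math

def llr_terms(lam0, lam1):
    """Per-SU weights c_n and the constant d of the log-likelihood ratio."""
    c = [math.log(a * b / ((1 - a) * (1 - b))) for a, b in zip(lam0, lam1)]
    d = sum(math.log((1 - a) / b) for a, b in zip(lam0, lam1))
    return c, d

def llr_direct(y, lam0, lam1):
    """ln Pr{y | idle} - ln Pr{y | busy} for independent binary reports."""
    total = 0.0
    for yn, a, b in zip(y, lam0, lam1):
        p_idle = a if yn == 1 else 1 - a
        p_busy = 1 - b if yn == 1 else b
        total += math.log(p_idle / p_busy)
    return total

# Illustrative crossover probabilities for three SUs sensing one channel.
lam0, lam1 = [0.1, 0.2, 0.05], [0.3, 0.15, 0.25]
c, d = llr_terms(lam0, lam1)
y = [1, 0, 1]
llr = sum(yn * cn for yn, cn in zip(y, c)) + d
```

When all crossover probabilities are below 1/2, every weight `c[n]` is negative: each "busy" report lowers the evidence that the channel is idle, by an amount that grows with the reliability of the reporting SU.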

We use $f_{0,k}$ and $F_{0,k}$ to denote the conditional
probability mass function (pmf) and the conditional cumulative
distribution function (cdf) of the random variable
$\sum_{n \in \mathbb{N}_m(k)} y_{m,n}(k)\, c_{m,n}(k)$ under hypothesis $\mathcal{H}_0$, respectively.
More generally, we use $f_{j,k}$ and $F_{j,k}$ to denote the conditional
pmf and the conditional cdf of this random variable
under hypothesis $\mathcal{H}_j$, where $j \in \{0,1\}$.

We use $\zeta$ to denote the collision probability constraint on
each individual primary channel. So $\tau'_m(k)$ is chosen such that:
$1 - F_{0,k}(\tau'_m(k)) \leq \zeta < 1 - F_{0,k}(\tau'_m(k)) + f_{0,k}(\tau'_m(k))$. The randomized
access decision rule is then given by

$$
\delta_{NP}(\mathbf{y}_k(m,:)) =
\begin{cases}
1 & \text{if } \sum y_{m,n}(k)\, c_{m,n}(k) > \tau'_m(k) \\
\gamma_m(k) & \text{if } \sum y_{m,n}(k)\, c_{m,n}(k) = \tau'_m(k) \\
0 & \text{if } \sum y_{m,n}(k)\, c_{m,n}(k) < \tau'_m(k)
\end{cases}
$$

where the summations are with respect to $n \in \mathbb{N}_m(k)$ and
$\delta_{NP}(\mathbf{y}_k(m,:))$ is the probability of accessing channel $m$.
Therefore, the SSDC decides to access channel $m$ only if
$\tilde{\delta}_{NP}(\mathbf{y}_k(m,:)) = 1$, where $\tilde{\delta}_{NP}$ is a Bernoulli random variable
with a probability of success equal to $\delta_{NP}$. The randomization
variable is given by $\gamma_m(k) = \dfrac{\zeta - \left(1 - F_{0,k}(\tau'_m(k))\right)}{f_{0,k}(\tau'_m(k))}$. Then,
the probability of detection of the white spaces is equal to:

$$
P_{D,m}(k, \mathbf{A}_k) = \Pr\{\tilde{\delta}_{NP} = 1 \mid \mathcal{H}_1\}
\tag{7}
$$

$$
= 1 - F_{1,k}(\tau'_m(k)) + \gamma_m(k)\, f_{1,k}(\tau'_m(k)).
\tag{8}
$$

We  will  use  the  probability  of  detection  in  the  derivation  of 
the  sensing  decisions  at  the  SSDC,  as  we  show  next. 
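The randomized Neyman-Pearson rule can be illustrated by enumerating all report vectors for a small number of SUs. The sketch below assumes conditionally independent reports with `lam0[n]` = $\Pr\{y = 1 \mid \text{idle}\}$ and `lam1[n]` = $\Pr\{y = 0 \mid \text{busy}\}$; the function names, the rounding used to merge equal statistic values, and all numbers are illustrative assumptions.

```python
import itertools
import math

def statistic_pmfs(lam0, lam1):
    """pmf of T = sum_n y_n c_n under idle (H1) and busy (H0), by enumeration."""
    c = [math.log(a * b / ((1 - a) * (1 - b))) for a, b in zip(lam0, lam1)]
    pmf_idle, pmf_busy = {}, {}
    for y in itertools.product((0, 1), repeat=len(c)):
        t = round(sum(yn * cn for yn, cn in zip(y, c)), 12)
        p_idle = math.prod(a if yn else 1 - a for yn, a in zip(y, lam0))
        p_busy = math.prod(1 - b if yn else b for yn, b in zip(y, lam1))
        pmf_idle[t] = pmf_idle.get(t, 0.0) + p_idle
        pmf_busy[t] = pmf_busy.get(t, 0.0) + p_busy
    return pmf_idle, pmf_busy

def np_rule(pmf_busy, zeta):
    """Threshold tau and randomization gamma meeting the collision level zeta."""
    tail = 0.0  # Pr{T > current candidate | busy}
    for t in sorted(pmf_busy, reverse=True):
        if tail + pmf_busy[t] > zeta:
            return t, (zeta - tail) / pmf_busy[t]
        tail += pmf_busy[t]
    return -math.inf, 0.0  # constraint never binds: always access

def access_prob(pmf, tau, gamma):
    """Pr{access} = Pr{T > tau} + gamma * Pr{T = tau} under the given pmf."""
    return sum(p for t, p in pmf.items() if t > tau) + gamma * pmf.get(tau, 0.0)

pmf_idle, pmf_busy = statistic_pmfs([0.1, 0.2, 0.05], [0.3, 0.15, 0.25])
tau, gamma = np_rule(pmf_busy, zeta=0.1)
collision = access_prob(pmf_busy, tau, gamma)   # equals zeta by construction
p_detect = access_prob(pmf_idle, tau, gamma)    # white-space detection probability
```

The randomization is what lets the rule meet the collision constraint with equality even though the statistic takes only finitely many values.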


IV.  Centralized  Sensing  Policy  at  the  SSDC 


A.  Optimal  Myopic  Channel  Sensing  Policy 

The objective of designing the sensing decision rule is to
maximize the total secondary system reward on all channels
accrued over time. To do this, we first define $b_0(m,k) =
\Pr\{S_m(k) = 0 \mid \mathbf{y}_{0:k-1}(m,:)\}$ and $b_1(m,k) = 1 - b_0(m,k)$
as the belief of channel $m$ being idle and busy at time
$k$, given the observation history on channel $m$ up to time
$k-1$, respectively. We define the belief vectors of idle and
busy as: $\mathbf{b}_0(k) = [b_0(1,k), \cdots, b_0(M,k)]^T$ and $\mathbf{b}_1(k) =
[b_1(1,k), \cdots, b_1(M,k)]^T$. At time $k$, after obtaining the sensing
observations from all SUs, the belief of the channel $m$
being idle in the next time slot $k+1$ is updated at the SSDC
using Bayes' formula:

$$
b_0(m,k+1) = \frac{\sum_{i \in \{0,1\}} b_i(m,k)\, p_{i0} \prod_{n \in \mathbb{N}_m(k)} f_i(y_{m,n}(k))}{\sum_{i \in \{0,1\}} b_i(m,k) \prod_{n \in \mathbb{N}_m(k)} f_i(y_{m,n}(k))},
\tag{9}
$$

where $f_i(y_{m,n}(k)) = \Pr\{y_{m,n}(k) \mid S_m(k) = i\}$, $\forall i \in \{0,1\}$,
is the conditional pmf of SUs' observations. For the unsensed
primary channels, the belief is updated based on the Markovian
evolution of primary channels: $[b_0(m,k+1),\, 1 - b_0(m,k+1)] =
[b_0(m,k),\, 1 - b_0(m,k)]\,\mathbf{P}$, where $\mathbf{P}$ is the transition matrix.
The belief vectors $\mathbf{b}_0(1)$ and $\mathbf{b}_1(1)$ are initialized with the
stationary distribution $\boldsymbol{\pi} = [\pi_0 \; \pi_1]$ of the Markov model given
by $\boldsymbol{\pi} = \boldsymbol{\pi}\mathbf{P}$.
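A minimal sketch of the belief recursion follows, under the assumptions that reports are conditionally independent, that `lam0[n]` = $\Pr\{y = 1 \mid \text{idle}\}$ and `lam1[n]` = $\Pr\{y = 0 \mid \text{busy}\}$, and that `p00` and `p10` are the transition probabilities into the idle state. Function names and numbers are illustrative.

```python
import math

def predict_idle(b0, p00, p10):
    """Markov prediction for an unsensed channel: Pr{idle at k+1}."""
    return b0 * p00 + (1.0 - b0) * p10

def update_idle(b0, reports, lam0, lam1, p00, p10):
    """Bayes update of the idle belief from the reports on a sensed channel."""
    like_idle = math.prod(a if y else 1 - a for y, a in zip(reports, lam0))
    like_busy = math.prod(1 - b if y else b for y, b in zip(reports, lam1))
    w_idle = b0 * like_idle            # weight of "idle at time k"
    w_busy = (1.0 - b0) * like_busy    # weight of "busy at time k"
    return (w_idle * p00 + w_busy * p10) / (w_idle + w_busy)

# Illustrative values: stationary prior for p00 = 0.9, p10 = 0.8.
prior = 0.8 / (0.1 + 0.8)
after_idle_reports = update_idle(prior, [0, 0], [0.1, 0.2], [0.3, 0.15], 0.9, 0.8)
after_busy_reports = update_idle(prior, [1, 1], [0.1, 0.2], [0.3, 0.15], 0.9, 0.8)
```

With an empty report list the update reduces to the Markov prediction, which is the rule applied to unsensed channels.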


The reward function for channel $m$ at time $k$ is defined
as: $r_m(k, \mathbf{A}_k) = B_m \mathbb{1}_{\{S_m(k) = 0\}} \mathbb{1}_{\{\tilde{\delta}_{NP} = 1\}}$, where we define
$B_m$ as the bandwidth of channel $m$, $\{\tilde{\delta}_{NP} = 1\}$ is the event that the
SSDC decides to access channel $m$, and $\mathbb{1}_E = 1$ if condition
$E$ is satisfied and $\mathbb{1}_E = 0$ otherwise. The expected reward
for channel $m$ at time $k$ is then given by $E\{r_m(k, \mathbf{A}_k)\} =
B_m\, b_0(m,k)\, P_{D,m}(k, \mathbf{A}_k)$.

We define the vector $\mathbf{S}(k) = [S_1(k), \cdots, S_M(k)] \in \mathbb{S}$ as
the state of the system at time $k$. When the SUs do not have
perfect knowledge of the states of the primary channels, the
actual state of the system is the belief vector. Smallwood and
Sondik have provided in [7] an algorithm to obtain the optimal
decisions for this Partially Observable Markov Decision Process
(POMDP) problem. With a large number of primary channels,
the algorithm requires very high computational complexity and
the solution is often intractable [2].

As an alternative, a myopic channel sensing decision can
be defined to maximize the total secondary reward over all
primary channels at each time step. The resulting sensing
policy is different from the optimal POMDP solution because
it does not maximize the sum of the discounted rewards
accrued over time starting from each time step. That is, the
myopic solution can be considered as a suboptimal solution
to the POMDP problem. The myopic sensing decision can
be expressed as:

$$
\mathbf{A}_k^* = \arg\max_{\mathbf{A}_k} \sum_{m=1}^{M} B_m\, b_0(m,k)\, P_{D,m}(k, \mathbf{A}_k),
\tag{10}
$$

subject to $\sum_{m=1}^{M} \mathbf{A}_k(m,n) = 1$ for each SU $n$, where $P_{D,m}(k, \mathbf{A}_k)$ is
defined in (7). In this case, the sensing decision obtained
from (10) is the optimal solution to the myopic sensing policy,
which we refer to as the optimal myopic solution. This solution
can be obtained by listing all $M^N$ combinations of matrix
$\mathbf{A}_k$ and picking the one that maximizes (10).
However, due to the complexity of this method, we propose
two different methods that compute suboptimal solutions to
the myopic policy and that have at most polynomial order
complexities.
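The exhaustive search can be sketched as an enumeration of all $M^N$ ways of giving each SU exactly one channel. In the sketch below the detection probability is supplied as a callback; the `toy_detection_prob` model (one minus the product of per-SU miss probabilities, with a hypothetical matrix `q`) is only a stand-in for $P_{D,m}$ and is not the Neyman-Pearson expression of Section III. Bandwidths are set to one.

```python
import itertools

def expected_reward(assign, M, b0, detection_prob):
    """Objective of the myopic problem for assign[n] = channel sensed by SU n."""
    total = 0.0
    for m in range(M):
        sus = [n for n, ch in enumerate(assign) if ch == m]
        total += b0[m] * detection_prob(m, sus)
    return total

def exhaustive_assignment(M, N, b0, detection_prob):
    """Evaluate all M**N assignments and keep the best one."""
    candidates = itertools.product(range(M), repeat=N)
    best = max(candidates, key=lambda a: expected_reward(a, M, b0, detection_prob))
    return best, expected_reward(best, M, b0, detection_prob)

# Hypothetical per-SU detection probabilities q[m][n] (illustration only).
q = [[0.9, 0.5, 0.6], [0.4, 0.8, 0.7]]

def toy_detection_prob(m, sus):
    miss = 1.0
    for n in sus:
        miss *= 1.0 - q[m][n]
    return 1.0 - miss

best_assign, best_value = exhaustive_assignment(2, 3, [0.5, 0.5], toy_detection_prob)
```

Even in this tiny instance the search visits $2^3 = 8$ assignments; the count grows exponentially in the number of SUs, which motivates the lower-complexity methods that follow.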


B.  Iterative  Hungarian  algorithm  for  channel  sensing  policy 

We propose a suboptimal algorithm for solving (10) by
applying the Hungarian algorithm iteratively. For simplicity,
we drop the time indices from the algorithm description and
we let $B_m = 1$. We assume that the crossover probabilities of
the BACs and the false alarm probability are given. We define
the $M \times N$ matrix $\mathbf{A}^{(m,n)}$ such that $\mathbf{A}^{(m,n)}(m',n') = 1$ if
$(m',n') = (m,n)$, and $\mathbf{A}^{(m,n)}(m',n') = 0$ otherwise. Then,
we use Algorithm 2 to find the channel sensing assignment
$\mathbf{A}$, which provides a suboptimal solution to (10).

We note that the complexity of the Hungarian algorithm [4]
is $(\max\{M, N\})^3$ for an $M \times N$ bipartite graph. Therefore,
the complexity of the proposed iterative Hungarian algorithm
is in the order of $\lceil N/M \rceil (\max\{M, N\})^3$ since the Hungarian
algorithm is computed iteratively $\lceil N/M \rceil$ times. In brief, the
proposed algorithm can solve the sensing channel assignment
with an order 4 polynomial complexity.

In particular, if $N \leq M$, Algorithm 2 is equivalent to the
Hungarian algorithm, which provides the optimal solution to
(10) in this case.


Algorithm  2  Iterative  Hungarian  Algorithm 

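$\mathbf{A} = \mathbf{0}_{M \times N}$ and $\mathbb{N} = \{1, \cdots, N\}$
while $\mathbb{N} \neq \emptyset$ do
    $\Delta\mathbf{P} = \mathbf{0}_{M \times N}$
    for $m \in \{1, \cdots, M\}$ and $n \in \mathbb{N}$ do
        $\Delta\mathbf{P}(m,n) = \left[P_{D,m}(\mathbf{A} + \mathbf{A}^{(m,n)}) - P_{D,m}(\mathbf{A})\right] b_0(m)$
    end for
    Run the Hungarian algorithm for the $M \times N$ bipartite
    graph whose edge weights are given in $\Delta\mathbf{P}$ to obtain the
    maximum sum matching.
    Remove the assigned vertices from the set $\mathbb{N}$.
    Append the new assignments to matrix $\mathbf{A}$.
end while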
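The loop structure of Algorithm 2 can be sketched as follows. To keep the example dependency-free, a brute-force maximum-weight matching stands in for the Hungarian algorithm (it is only usable for tiny instances); in practice a routine such as SciPy's `linear_sum_assignment` with `maximize=True` could take its place. The `toy_detection_prob` model and the matrix `q` are hypothetical stand-ins for $P_{D,m}$.

```python
import itertools

def max_weight_matching(weights, channels, sus):
    """Brute-force stand-in for the Hungarian algorithm (tiny instances only)."""
    if len(sus) >= len(channels):
        candidates = (list(zip(channels, p))
                      for p in itertools.permutations(sus, len(channels)))
    else:
        candidates = (list(zip(p, sus))
                      for p in itertools.permutations(channels, len(sus)))
    return max(candidates, key=lambda pairs: sum(weights[m][n] for m, n in pairs))

def iterative_assignment(M, N, b0, detection_prob):
    assigned = {m: [] for m in range(M)}   # SUs currently sensing each channel
    remaining = list(range(N))             # SUs not yet assigned
    while remaining:
        # Marginal gain of adding SU n to channel m, weighted by the idle belief.
        gain = [[0.0] * N for _ in range(M)]
        for m in range(M):
            base = detection_prob(m, assigned[m])
            for n in remaining:
                gain[m][n] = (detection_prob(m, assigned[m] + [n]) - base) * b0[m]
        for m, n in max_weight_matching(gain, list(range(M)), remaining):
            assigned[m].append(n)
            remaining.remove(n)
    return assigned

# Hypothetical per-SU detection probabilities q[m][n] (illustration only).
q = [[0.9, 0.5, 0.6], [0.4, 0.8, 0.7]]

def toy_detection_prob(m, sus):
    miss = 1.0
    for n in sus:
        miss *= 1.0 - q[m][n]
    return 1.0 - miss

result = iterative_assignment(2, 3, [0.5, 0.5], toy_detection_prob)
```

Each pass assigns up to $M$ SUs, so with $N = 3$ and $M = 2$ the loop runs $\lceil N/M \rceil = 2$ times; the second pass places the leftover SU where its marginal gain is largest.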


[Plot annotation: time steps = 1000, $\zeta = 0.1$]


C.  Heuristic  algorithm  for  channel  sensing  policy 

We next propose a heuristic algorithm that reduces
the complexity to a linear order in the number of
secondary users $N$. The algorithm randomly picks a secondary
user $n$ and assigns it to the channel $m$ for which it has the
highest detection probability. Also, we allow at most $\lceil N/M \rceil$ SUs
to sense each channel so that the SUs sense all the
channels evenly and keep accurate information about the belief of
every channel state. A description of the proposed heuristic
sensing method is given in Algorithm 3, in which we drop the
time indices for simplicity.


Algorithm  3  Heuristic  Sensing  Assignment 

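$\mathbf{A} = \mathbf{0}_{M \times N}$ and $\mathbb{N} = \{1, \cdots, N\}$.
while $\mathbb{N} \neq \emptyset$ do
    Pick randomly $n \in \mathbb{N}$.
    Let $m^*$ be the channel that maximizes the detection probability of SU $n$ over all $m \in \{1, \cdots, M\}$ such that $\sum_{n' \in \{1, \cdots, N\}} \mathbf{A}(m,n') < \lceil N/M \rceil$.
    Set $\mathbf{A}(m^*, n) = 1$ and remove $n$ from $\mathbb{N}$.
end while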
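A minimal sketch of the heuristic follows. It assumes that a matrix `q[m][n]` of detection probabilities of SU `n` on channel `m` is already available; this matrix, the function name, and the seeded shuffle used to model the random pick are illustrative assumptions.

```python
import math
import random

def heuristic_assignment(q, seed=0):
    """Assign each SU to its best channel, with at most ceil(N/M) SUs per channel."""
    M, N = len(q), len(q[0])
    cap = math.ceil(N / M)
    load = [0] * M                       # number of SUs already on each channel
    order = list(range(N))
    random.Random(seed).shuffle(order)   # SUs are picked in random order
    assign = {}
    for n in order:
        # Only channels that have not reached the cap are eligible.
        open_channels = [m for m in range(M) if load[m] < cap]
        best = max(open_channels, key=lambda m: q[m][n])
        assign[n] = best
        load[best] += 1
    return assign

# Hypothetical detection probabilities (illustration only).
q = [[0.9, 0.5, 0.6], [0.4, 0.8, 0.7]]
assign = heuristic_assignment(q)
```

Since the total capacity $M \lceil N/M \rceil$ is at least $N$, an eligible channel always exists, and each SU is handled exactly once.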

V.  Simulation  Results  and  Discussions 


We  show  in  the  simulations  the  average  utilization  of  the 
spectrum  holes  as  a  function  of  the  average  SNR  at  the 
secondary  detectors.  We  define  the  average  utilization  of  the 
spectrum  holes  as: 


$$
U = \frac{\sum_{k=1}^{K} \sum_{m=1}^{M} \sum_{n=1}^{N} \mathbf{B}_k(m,n)\,(1 - S_m(k))}{\sum_{k=1}^{K} \sum_{m=1}^{M} (1 - S_m(k))},
\tag{11}
$$

where $K$ denotes the number of time slots and $\mathbf{B}_k(m,n)$ is the accessing decision such that
$\sum_{n=1}^{N} \mathbf{B}_k(m,n) \leq 1$, meaning that at most one SU can
access a channel at each time instant if it is found to be
idle. The average SNR at the $n$-th secondary detector when
sensing channel $m$ at time $k$ is equal to $\mathrm{SNR} = \sigma_x^2/\sigma_w^2$,
and we assume the fading coefficients $h_{m,n}(k)$ to be
independent identically distributed (i.i.d.) standard Gaussian
random variables. The primary channels are assumed to have
independent Markovian evolutions with the transition
matrix:


$$
\mathbf{P} = \begin{bmatrix} 0.9 & 0.1 \\ 0.8 & 0.2 \end{bmatrix}.
\tag{12}
$$
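The utilization metric in (11) can be computed from logged decisions and states. In the sketch below, `access[k][m][n]` holds the accessing decision of SU `n` on channel `m` in slot `k` and `states[k][m]` holds the true channel state (0 = idle); the nested-list layout and the tiny data set are illustrative assumptions.

```python
def average_utilization(access, states):
    """Fraction of idle channel-slots that were actually accessed by an SU."""
    used = 0
    holes = 0
    for k, row in enumerate(states):
        for m, s in enumerate(row):
            idle = 1 - s
            holes += idle                       # spectrum hole available
            used += sum(access[k][m]) * idle    # hole exploited by some SU
    return used / holes if holes else 0.0

# Two slots, two channels, two SUs (illustration only).
states = [[0, 1], [0, 0]]
access = [[[1, 0], [0, 0]],
          [[0, 0], [0, 1]]]
u = average_utilization(access, states)
```

Here three channel-slots are idle and two of them are accessed, giving a utilization of 2/3.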


We compare the average utilization of the spectrum holes
that is obtained by using the three different methods. In order
to compare with the optimal myopic solution, the values of
$M$ and $N$ are not chosen too large because, in that case, the
optimal solution becomes intractable. In Fig. 3, we observe
that the performance of the iterative Hungarian algorithm is
close to the optimal myopic solution at low and high SNRs.
However, the performance of the heuristic algorithm is close to
the other two algorithms only in the low SNR region. We note
that the average utilization converges to $\zeta = 0.1$ at low SNR,
which conforms with the Receiver Operating Characteristics
(ROC) of the Neyman-Pearson detector, which becomes linear
at low SNR [5], thus making the detection probability equal
to the false alarm probability.


Fig.  3.  Average  Utilization  of  the  spectrum  holes 


Note that, when $M = N = 5$, and referring to
Section IV, the computational complexity is reduced by
$\frac{M^N}{\lceil N/M \rceil (\max\{M,N\})^3} = 25$ and $\frac{M^N}{N} = 625$ times when applying
the iterative Hungarian and the heuristic algorithms, respectively,
as compared to the optimal myopic solution.
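These two ratios follow directly from the operation counts of Section IV and can be reproduced with a few lines of arithmetic; the variable names are illustrative.

```python
import math

M = N = 5
exhaustive = M ** N                                 # all channel assignments
iterative = math.ceil(N / M) * max(M, N) ** 3       # iterative Hungarian
heuristic = N                                       # one step per SU

ratio_iterative = exhaustive // iterative
ratio_heuristic = exhaustive // heuristic
```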


VI.  Conclusions 

In this paper, we presented a centralized spectrum sensing
and accessing protocol for SUs in a CR network. We
considered a more realistic CR network, compared to those
that have been assumed in previous DSA designs, by taking
into account the spatial and temporal variations of channel
fading coefficients on the different primary channels. We
derived the optimal access decision strategy and the optimal
sensing decisions for a myopic policy assuming a centralized
decision-making architecture. As an alternative to the high
complexity of the optimal myopic channel sensing policy, we
proposed two algorithms for obtaining near-optimal policies:
the first is based on the iterative Hungarian algorithm and
has fourth-order complexity, while the second
is based on a heuristic method and has linear complexity.
The simulation results showed that the two proposed low-complexity
algorithms achieve a performance very close to
the optimal solution, but with a much smaller computational
effort.